Big Data to Knowledge—Harnessing Semiotic Relationships of Data Quality and Skills in Genome Curation Work
نویسنده
چکیده
This article aims to understand the views of genomics scientists with regard to the data quality assurances associated with semiotics and Data-Information-Knowledge (DIK). The resulting communication of signs generated from genomic curation work, was found within different semantic levels of DIK that correlate specific data quality dimensions with their respective skills. Syntactic DQ dimensions were ranked the highest among all other semiotic data quality dimensions, which indicated that scientists spend great efforts for handling data wrangling activities in genome curation work. Semantic and pragmatic related sign communications were about meaningful interpretation, thus required additional adaptive and interpretative skills to deal with data quality issues. This expanded concept of ‘curation’ as sign/semiotic was not previously explored from the practical to the theoretical perspectives. The findings inform policy makers and practitioners to develop framework and cyberinfrastructure that facilitate the initiatives and advocacies of ‘Big Data to Knowledge’ by funding agencies. The findings from this study can also help plan data quality assurance policies and thus maximize the efficiency of genomic data management. Our results give strong support to the relevance of data quality skills communication in relation to data quality assurance in genome curation activities.
منابع مشابه
Domain knowledge and data quality perceptions in genome curation work
Purpose-This article aims at understanding genomics scientists' perceptions in data quality assurances based on their domain knowledge. Design/methodology/approach-The study used a survey method to collect responses from 149 genomics scientists grouped by domain knowledge. They ranked the top-five quality criteria based on hypothetical curation scenarios. The results were compared using Chi-Squ...
متن کاملPrioritization of data quality dimensions and skills requirements in genome annotation work
The rapid accumulation of genome annotations, as well as their widespread reuse in clinical and scientific practice, poses new challenges to management of the quality of scientific data. This study contributes towards better understanding of scientist perception and priorities for data quality and data quality assurance skills needed in genome annotation. Our study was guided by a previously de...
متن کاملStudy of the foundation, models and issues of research data curation and management in scientific and academic environments
Background and Aim: The purpose of this paper is to study, identifying and discuss the foundation and concepts, models and frameworks, dimensions and challenges of research data curation and management in scientific and academic environments. Method: This article is a review article and library method was used to collect scientific and research texts in this field. In this research, external an...
متن کاملThe BIG Data Center: from deposition to integration to translation
Biological data are generated at unprecedentedly exponential rates, posing considerable challenges in big data deposition, integration and translation. The BIG Data Center, established at Beijing Institute of Genomics (BIG), Chinese Academy of Sciences, provides a suite of database resources, including (i) Genome Sequence Archive, a data repository specialized for archiving raw sequence reads, ...
متن کاملGenomics Data Curation Roles, Skills, and Perception of Data Quality
Compared to a decade ago, genomics scientists, driven by technical changes and availability of massive genomic data, are performing a wider plurality of curation roles including those of end-users, curators, or dual-role users. Scientists with different curation roles (including that of end user) may focus on different data quality aspects and skills requirements in a community curation environ...
متن کامل